19 research outputs found

    Learning Generative Models across Incomparable Spaces

    Full text link
    Generative Adversarial Networks have shown remarkable success in learning a distribution that faithfully recovers a reference distribution in its entirety. However, in some cases, we may want to only learn some aspects (e.g., cluster or manifold structure), while modifying others (e.g., style, orientation or dimension). In this work, we propose an approach to learn generative models across such incomparable spaces, and demonstrate how to steer the learned distribution towards target properties. A key component of our model is the Gromov-Wasserstein distance, a notion of discrepancy that compares distributions relationally rather than absolutely. While this framework subsumes current generative models in identically reproducing distributions, its inherent flexibility allows application to tasks in manifold learning, relational learning and cross-domain learning. Comment: International Conference on Machine Learning (ICML)
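The abstract's key idea of comparing distributions "relationally rather than absolutely" can be made concrete with a tiny sketch: the Gromov-Wasserstein objective compares the pairwise distance matrices within each space under a coupling, never comparing points across spaces directly. This is only an illustration of the distance (restricted to permutation couplings and brute-forced over 3 points), not the paper's GAN training procedure; the point sets and metrics are made up.

```python
# Toy illustration of the relational comparison behind the Gromov-Wasserstein
# (GW) distance: we compare intra-space pairwise distance matrices under a
# coupling, so the two spaces never need shared coordinates or dimension.
import itertools
import math

def dist_matrix(points, metric):
    return [[metric(a, b) for b in points] for a in points]

def gw_cost(C1, C2, perm):
    # GW objective restricted to permutation couplings:
    # sum over point pairs (i, k) of (C1[i][k] - C2[perm[i]][perm[k]])^2
    n = len(C1)
    return sum((C1[i][k] - C2[perm[i]][perm[k]]) ** 2
               for i in range(n) for k in range(n))

# Space A: points on a line; Space B: the same relational structure
# embedded in 2-D -- the spaces are incomparable, their distances are not.
xs = [0.0, 1.0, 3.0]
ys = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)]

C1 = dist_matrix(xs, lambda a, b: abs(a - b))
C2 = dist_matrix(ys, lambda a, b: math.dist(a, b))

# Brute-force the best permutation coupling (feasible only for tiny n).
best_perm = min(itertools.permutations(range(3)),
                key=lambda p: gw_cost(C1, C2, p))
print(best_perm, gw_cost(C1, C2, best_perm))  # (0, 1, 2) 0.0
```

Because the 1-D and 2-D point sets share the same internal distance structure, the relational cost of the identity matching is exactly zero despite the mismatched dimensions.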

    Learning Graph Models for Retrosynthesis Prediction

    Full text link
    Retrosynthesis prediction is a fundamental problem in organic synthesis, where the task is to identify precursor molecules that can be used to synthesize a target molecule. A key consideration in building neural models for this task is aligning model design with strategies adopted by chemists. Building on this viewpoint, this paper introduces a graph-based approach that capitalizes on the idea that the graph topology of precursor molecules is largely unaltered during a chemical reaction. The model first predicts the set of graph edits transforming the target into incomplete molecules called synthons. Next, the model learns to expand synthons into complete molecules by attaching relevant leaving groups. This decomposition simplifies the architecture, making its predictions more interpretable and amenable to manual correction. Our model achieves a top-1 accuracy of 53.7%, outperforming previous template-free and semi-template-based methods.
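The two-stage decomposition the abstract describes (graph edits produce synthons, synthons are completed with leaving groups) can be sketched on a toy graph. Everything below is illustrative scaffolding, not chemistry: the "molecule", atom labels, the hand-picked bond deletion, and the placeholder leaving group all stand in for what the paper's neural model predicts.

```python
# Toy sketch of the two-stage retrosynthesis decomposition: (1) apply a
# predicted graph edit (here hand-picked) that splits the target into
# synthons, then (2) complete each synthon with a leaving group.

def connected_components(atoms, bonds):
    """Split an undirected graph (atom set, bond set) into components."""
    adj = {a: set() for a in atoms}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Target "molecule" as a graph; atoms are just labels here.
atoms = {"C1", "C2", "O3", "C4"}
bonds = {("C1", "C2"), ("C2", "O3"), ("O3", "C4")}

# Stage 1: a predicted edit deletes the C2-O3 bond -> two synthons.
edited = bonds - {("C2", "O3")}
synthons = connected_components(atoms, edited)

# Stage 2: attach a (hypothetical) leaving group "LG" to each synthon.
precursors = [comp | {"LG"} for comp in synthons]
print(sorted(sorted(c) for c in synthons))  # [['C1', 'C2'], ['C4', 'O3']]
```

The split into an edit-prediction step and a completion step is what makes each stage individually inspectable, which is the interpretability argument in the abstract.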

    Neural Unbalanced Optimal Transport via Cycle-Consistent Semi-Couplings

    Full text link
    Comparing unpaired samples of a distribution or population taken at different points in time is a fundamental task in many application domains where measuring populations is destructive and cannot be done repeatedly on the same sample, such as in single-cell biology. Optimal transport (OT) can solve this challenge by learning an optimal coupling of samples across distributions from unpaired data. However, the usual formulation of OT assumes conservation of mass, which is violated in unbalanced scenarios in which the population size changes (e.g., cell proliferation or death) between measurements. In this work, we introduce NubOT, a neural unbalanced OT formulation that relies on the formalism of semi-couplings to account for creation and destruction of mass. To estimate such semi-couplings and generalize out-of-sample, we derive an efficient parameterization based on neural optimal transport maps and propose a novel algorithmic scheme through a cycle-consistent training procedure. We apply our method to the challenging task of forecasting heterogeneous responses of multiple cancer cell lines to various drugs, where we observe that by accurately modeling cell proliferation and death, our method yields notable improvements over previous neural optimal transport methods.
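A minimal numeric illustration of the semi-coupling formalism the abstract invokes, under the usual discrete reading: a pair of plans (G1, G2) where G1's row sums match the source masses and G2's column sums match the target masses, so the entrywise ratio G2/G1 encodes creation or destruction of mass along each transport route. The populations, masses, and the particular plan pair are made-up numbers, not the paper's learned neural parameterization.

```python
# Semi-coupling sketch: G1 satisfies the source marginal constraint, G2 the
# target one; mass is not conserved between them, modeling growth and death.
mu = [1.0, 1.0]          # source population: two cell states, mass 1 each
nu = [1.5, 0.5]          # target population: state 0 proliferated, state 1 died

# One valid semi-coupling pair: both plans route state i -> state i.
G1 = [[1.0, 0.0],
      [0.0, 1.0]]
G2 = [[1.5, 0.0],
      [0.0, 0.5]]

row_sums = [sum(row) for row in G1]
col_sums = [sum(G2[i][j] for i in range(2)) for j in range(2)]
assert row_sums == mu and col_sums == nu  # semi-coupling marginal constraints

# Per-route growth factor: how much mass each transported unit gains/loses.
growth = [G2[i][i] / G1[i][i] for i in range(2)]
print(growth)  # [1.5, 0.5]
```

A balanced coupling would be infeasible here (total source mass 2.0 vs. target mass 2.0 happens to match globally, but the per-state change still requires the two plans to differ), which is exactly the failure mode of conservative OT the abstract points to.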

    BBF RFC 105: The Intein standard - a universal way to modify proteins after translation

    Get PDF
    This Request for Comments (RFC) proposes a new standard that allows for easy and flexible cloning of intein constructs and thus makes this technology accessible to the synthetic biology community.

    Neural Optimal Transport for Dynamical Systems: Methods and Applications in Biomedicine

    No full text
    Modeling dynamical systems is a core subject of many scientific disciplines as it allows us to predict future states, understand complex interactions over time, and enable informed decision-making. Biological systems in particular are governed by dynamical processes, with their inherently complex and constantly changing patterns of interactions and behaviors. Single-cell biology has revolutionized biomedical research, as it allows us to monitor such systems at unprecedented scales. At the same time, it presents us with formidable challenges: While single-cell high-throughput methods routinely produce millions of data points, they are destructive assays, such that the same cell cannot be observed twice nor profiled over time. Since many of the most pressing questions in the field involve modeling the dynamic responses of heterogeneous cell populations to various stimuli, such as therapeutic drugs or developmental signals, there is a pressing need to provide computational methods that can circumvent that limitation and re-align these unpaired measurements. Optimal transport (OT) has emerged as a major opportunity to fill in that gap in silico as it allows us to reconstruct how a distribution evolves, given only access to distinct snapshots of unaligned data points. Classical OT methods, however, do not generalize to unseen samples. Yet, this is crucial when, for example, predicting treatment responses of incoming patient samples or extrapolating cellular dynamics beyond the measured horizon. By harnessing the theoretical constructs of OT, this thesis explores and develops neural static and dynamic optimal transport schemes for elucidating the intricate dynamics of biological populations. 
It encapsulates an array of algorithmic frameworks, with contributions to both the understanding and prediction of population dynamics: First, we derive static neural optimal transport schemes capable of learning a map between the unpaired distributions of unperturbed and perturbed cells. These models excel at predicting single-cell responses to varying perturbations, such as cancer drug screens, and generalize the inference of treatment outcomes to unobserved cell types and patients. This has significant implications for personalized medicine, as it allows for the prediction of treatment responses for new patients in large-scale clinical studies. Second, we explore dynamic neural optimal transport formulations that leverage the connections of OT to partial differential equations and gradient flows through the Jordan-Kinderlehrer-Otto scheme, as well as stochastic differential equations and optimal control through the diffusion Schrödinger bridge. These methods then serve as robust tools for reconstructing stochastic and continuous-time dynamics from marginal observations, allowing us to dissect fine-grained and time-resolved cellular mechanisms. This thesis connects a variety of seemingly unrelated concepts into a unified framework, and the presented methodologies offer a computational and mathematical foundation for modeling cellular dynamics. This provides new avenues to understand cellular heterogeneity, plasticity, and response landscapes. Such neural parameterizations of static and dynamic OT that allow for out-of-sample inference lay the groundwork for exciting opportunities to make novel biological discoveries, infer personalized therapies from single-cell patient samples, and push the boundaries of regenerative medicine.

    Proximal Optimal Transport Modeling of Population Dynamics

    Full text link
    Consider a population of particles evolving with time, monitored through snapshots, using particles sampled within the population at successive timestamps. Given only access to these snapshots, can we reconstruct individual trajectories for these particles? This question arises in many crucial scientific challenges of our time, notably single-cell genomics. In this paper, we propose to model population dynamics as realizations of a causal Jordan-Kinderlehrer-Otto (JKO) flow of measures: The JKO scheme posits that the new configuration taken by a population at time t+1 is one that trades off a better configuration for the population, in the sense that it decreases an energy, while remaining close (in Wasserstein distance) to the previous configuration observed at t. Our goal in this work is to learn such an energy given data. To that end, we propose JKOnet, a neural architecture that computes (in end-to-end differentiable fashion) the JKO flow given a parametric energy and initial configuration of points. We demonstrate the good performance and robustness of the JKOnet fitting procedure, compared to a more direct forward method.
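The JKO trade-off the abstract describes, decrease an energy while staying Wasserstein-close to the previous configuration, can be sketched for a single particle: the next state minimizes energy(x) + ||x - x_prev||² / (2τ). The quadratic energy, the per-particle Euclidean proxy for the Wasserstein term, and the inner gradient-descent solver are all illustrative choices; JKOnet's contribution is learning the energy end-to-end from snapshot data, which this toy does not do.

```python
# One JKO proximal step: minimize energy(x) + ||x - x_prev||^2 / (2 * tau),
# solved here by plain gradient descent on the proximal objective.
def jko_step(x_prev, grad_energy, tau, lr=0.1, iters=500):
    x = x_prev
    for _ in range(iters):
        grad = grad_energy(x) + (x - x_prev) / tau
        x -= lr * grad
    return x

energy_grad = lambda x: x          # E(x) = x^2 / 2, minimized at 0
tau = 0.5
x0 = 2.0

x1 = jko_step(x0, energy_grad, tau)
# Closed form for this quadratic energy: argmin = x0 / (1 + tau)
print(round(x1, 4), round(x0 / (1 + tau), 4))  # 1.3333 1.3333
```

Small τ keeps each step close to the previous configuration (a cautious flow), while large τ lets the energy dominate; iterating `jko_step` traces out the discrete flow whose energy JKOnet tries to recover from data.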

    Supervised Training of Conditional Monge Maps

    No full text
    Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another. That theory has been mostly used to estimate, given a pair of source and target probability measures (μ, ν), a parameterized map T_θ that can efficiently map μ onto ν. In many applications, such as predicting cell responses to treatments, pairs of input/output data measures (μ, ν) that define optimal transport problems do not arise in isolation but are associated with a context c, as for instance a treatment when comparing populations of untreated and treated cells. To account for that context in OT estimation, we introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable, using several pairs of measures (μ_i, ν_i) tagged with a context label c_i. CondOT learns a global map T_θ conditioned on context that is not only expected to fit all labeled pairs in the dataset {(c_i, (μ_i, ν_i))}, i.e., T_θ(c_i)♯μ_i ≈ ν_i, but should also generalize to produce meaningful maps T_θ(c_new) when conditioned on unseen contexts c_new. Our approach harnesses and provides a novel usage for partially input convex neural networks, for which we introduce a robust and efficient initialization strategy inspired by Gaussian approximations. We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells, using only observations of the effects of said perturbations separately.

    Multi-Scale Representation Learning on Proteins

    No full text
    Proteins are fundamental biological entities mediating key roles in cellular function and disease. This paper introduces a multi-scale graph construction of a protein (HoloProt) connecting surface to structure and sequence. The surface captures coarser details of the protein, while sequence as primary component and structure (comprising secondary and tertiary components) capture finer details. Our graph encoder then learns a multi-scale representation by allowing each level to integrate the encoding from level(s) below with the graph at that level. We test the learned representation on two tasks: (i) ligand binding affinity (regression) and (ii) protein function prediction (classification). On the regression task, contrary to previous methods, our model performs consistently and reliably across different dataset splits, outperforming all baselines on most splits. On the classification task, it achieves performance close to the top-performing model while using 10x fewer parameters. To improve the memory efficiency of our construction, we segment the multiplex protein surface manifold into molecular superpixels and substitute the surface with these superpixels at little to no performance loss.

    Recovering Stochastic Dynamics via Gaussian Schrödinger Bridges

    Full text link
    We propose a new framework to reconstruct a stochastic process {P_t : t ∈ [0, T]} using only samples from its marginal distributions, observed at start and end times 0 and T. This reconstruction is useful to infer population dynamics, a crucial challenge, e.g., when modeling the time-evolution of cell populations from single-cell sequencing data. Our general framework encompasses the more specific Schrödinger bridge (SB) problem, where P_t represents the evolution of a thermodynamic system at almost equilibrium. Estimating such bridges is notoriously difficult, motivating our proposal for a novel adaptive scheme called the GSBflow. Our goal is to rely on Gaussian approximations of the data to provide the reference stochastic process needed to estimate SB. To that end, we solve the SB problem with Gaussian marginals, for which we provide, as a central contribution, a closed-form solution and SDE representation. We use these formulas to define the reference process used to estimate more complex SBs, and show that this does indeed help with its numerical solution. We obtain notable improvements when reconstructing both synthetic processes and single-cell genomics experiments.
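The tractability of Gaussian marginals that the abstract exploits has a simple one-dimensional analogue: between 1-D Gaussians, the optimal transport map is affine, T(x) = m1 + (s1 / s0)(x - m0). This is only the zero-noise cousin of the paper's closed-form Schrödinger bridge (which comes with a full SDE representation); the marginal parameters below are made-up numbers.

```python
# Push samples of N(m0, s0^2) through the affine 1-D Gaussian OT map and
# check that the result matches the target marginal N(m1, s1^2) in moments.
import random
import statistics

m0, s0 = 0.0, 1.0    # source marginal N(m0, s0^2)
m1, s1 = 3.0, 2.0    # target marginal N(m1, s1^2)

T = lambda x: m1 + (s1 / s0) * (x - m0)

random.seed(0)
xs = [random.gauss(m0, s0) for _ in range(50_000)]
ys = [T(x) for x in xs]

# The pushed-forward samples should match the target moments closely.
print(round(statistics.mean(ys), 1), round(statistics.stdev(ys), 1))
```

It is exactly this kind of cheap, closed-form Gaussian transport that makes Gaussian approximations attractive as reference processes when estimating bridges between more complex distributions.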